Search CORE

17 research outputs found

Masked Conditional Neural Networks for Sound Recognition

Author: Medhat Fady
Publication venue: University of York
Publication date: 01/03/2018
Field of study

Sound recognition has been studied for decades to grant machines the human hearing ability. The advances in this field help in a range of applications, from industrial ones such as fault detection in machines and noise monitoring to household applications such as surveillance and hearing aids. The problem of sound recognition like any pattern recognition task involves the reliability of the extracted features and the recognition model. The problem has been approached through decades of crafted features used collaboratively with models based on neural networks or statistical models such as Gaussian Mixtures and Hidden Markov models. Neural networks are currently being considered as a method to automate the feature extraction stage together with the already incorporated role of recognition. The performance of such models is approaching handcrafted features. Current neural network based models are not primarily designed for the nature of the sound signal, which may not optimally harness distinctive properties of the signal. This thesis proposes neural network models that exploit the nature of the time-frequency representation of the sound signal. We propose the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN). The CLNN is designed to account for the temporal dimension of a signal and behaves as the framework for the MCLNN. The MCLNN allows a filterbank-like behaviour to be embedded within the network using a specially designed binary mask. The masking subdivides the frequency range of a signal into bands and allows concurrent consideration of different feature combinations analogous to the manual handcrafting of the optimum set of features for a recognition task. The proposed models have been evaluated through an extensive set of experiments using a range of publicly available datasets of music genres and environmental sounds, where they surpass state-of-the-art Convolutional Neural Networks and several hand-crafted attempts

White Rose E-theses Online

Environmental Sound Recognition using Masked Conditional Neural Networks

Author: Medhat Fady
Chesmore David
Robinson John
Publication venue
Publication date: 30/10/2010
Field of study

Neural network based architectures used for sound recognition are usually adapted from other application domains, which may not harness sound related properties. The ConditionaL Neural Network (CLNN) is designed to consider the relational properties across frames in a temporal signal, and its extension the Masked ConditionaL Neural Network (MCLNN) embeds a filterbank behavior within the network, which enforces the network to learn in frequency bands rather than bins. Additionally, it automates the exploration of different feature combinations analogous to handcrafting the optimum combination of features for a recognition task. We applied the MCLNN to the environmental sounds of the ESC-10 dataset. The MCLNN achieved competitive accuracies compared to state-of-the-art convolutional neural networks and hand-crafted attempts.Comment: Boltzmann Machine, RBM, Conditional RBM, CRBM, Deep Neural Network, DNN, Conditional Neural Network, CLNN, Masked Conditional Neural Net-work, MCLNN, Environmental Sound Recognition, ESR, Advanced Data Mining and Applications (ADMA) Year: 201

arXiv.org e-Print Archive

Revistas y Boletines - Banco de la República

Masked Conditional Neural Networks for sound classification

Author: Chesmore David
Medhat Fady
Robinson John Allen
Publication venue: 'Elsevier BV'
Publication date: 01/05/2020
Field of study

The remarkable success of deep convolutional neural networks in image-related applications has led to their adoption also for sound processing. Typically the input is a time–frequency representation such as a spectrogram, and in some cases this is treated as a two-dimensional image. However, spectrogram properties are very different to those of natural images. Instead of an object occupying a contiguous region in a natural image, frequencies of a sound are scattered about the frequency axis of a spectrogram in a pattern unique to that particular sound. Applying conventional convolution neural networks has therefore required extensive hand-tuning, and presented the need to find an architecture better suited to the time–frequency properties of audio. We introduce the ConditionaL Neural Network (CLNN)1 and its extension, the Masked ConditionaL Neural Network (MCLNN) designed to exploit the nature of sound in a time–frequency representation. The CLNN is, broadly speaking, linear across frequencies but non-linear across time: it conditions its inference at a particular time based on preceding and succeeding time slices, and the MCLNN use a controlled systematic sparseness that embeds a filterbank-like behavior within the network. Additionally, the MCLNN automates the concurrent exploration of several feature combinations analogous to hand-crafting the optimum combination of features for a recognition task. We have applied the MCLNN to the problem of music genre classification, and environmental sound recognition on several music (Ballroom, GTZAN, ISMIR2004, and Homburg), and environmental sound (Urbansound8K, ESC-10, and ESC-50) datasets. The classification accuracy of the MCLNN surpasses neural networks based architectures including state-of-the-art Convolutional Neural Networks and several hand-crafted attempts

White Rose Research Online

TMIXT: A process flow for Transcribing MIXed Handwritten and Machine-printed Text

Author: Breckon Toby
Jaf Sardar
McGough Stephen
Medhat Fady
Mohammadi Mahnaz
Obara Boguslaw
Theodoropoulos Georgios
Willcocks G. Chris
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/12/2018
Field of study

Sunderland University Institutional Repository

Domain-specific languages for the design, deployment and manipulation of heterogeneous databases

Author: Di Ruscio Davide
Kolovos Dimitrios
Medhat Fady
Paige Richard
Scholze Sebastian
Van Der Storm Tijs
Zolotas Athanasios
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019
Field of study

The need for levels of availability and scalability beyond those supported by relational databases has led to the emergence of a new generation of purpose-specific databases grouped under the term NoSQL. In general, NoSQL databases are designed with horizontal scalability as a primary concern and deliver increased availability and fault tolerance at a cost of temporary inconsistency and reduced durability of data. To balance the requirements for data consistency and availability, organisations increasingly migrate towards hybrid data persistence architectures comprising both relational and NoSQL databases. The consensus is that this trend will only become stronger in the future; critical data will continue to be stored in ACID (largely relational) databases while non-critical data will be progressively migrated to high-availability NoSQL databases. Designing and deploying a hybrid data persistence architecture that involves a combination of relational and NoSQL databases is a complex, technically challenging and error-prone task. In this paper we outline a model-based methodology developed in the context of the EC-funded H2020 TYPHON project for designing, developing, querying and evolving such scalable architectures for persistence, analytics and monitoring of large volumes of hybrid (relational, graph-based, document-based, natural language, etc.) data, in a systematic and disciplined manner

Crossref

CWI's Institutional Repository

White Rose Research Online